Skip to content

Conversation

@fjetter
Copy link
Member

@fjetter fjetter commented Nov 11, 2024

This is using the Task.fuse (see dask/dask#11509) method for the Fused task. In theory this should've reduced complexity quite a bit but I encountered problems since the subgraphs are including dead nodes, i.e. there are expressions in _expr that are no longer reachable if we walk the tasks. The Task.fuse raises in such a case since it requires a subgraph that is uniquely reducing to a single task (I put that condition in to catch stuff like this).

Adding an explicit culling step in advance fixes this so everything is OK. However, where is this redundant node coming from and can we get rid of this??

cc @phofl

Comment on lines 3778 to 3790
internal_tasks[t.key] = t
# The above is ambiguous and we have to cull to get an unambiguous graph
outkey = (self.exprs[0]._name, index)
work = [outkey]
internal_tasks_culled = []
while work:
tkey = work.pop()
if tkey not in internal_tasks:
# External dependency
continue
t = internal_tasks[tkey]
internal_tasks_culled.append(t)
work.extend(t.dependencies)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this shouldn't be necessary if there weren't dead nodes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example with dead nodes? I am curious to see what's going on there

@fjetter
Copy link
Member Author

fjetter commented Nov 11, 2024

Adding an explicit culling step in advance fixes this so everything is OK. However, where is this redundant node coming from and can we get rid of this??

Note: The same is happening in dask/dask when trying to replace the SubgraphCallables :(

@phofl
Copy link
Collaborator

phofl commented Nov 13, 2024

I'ver removed the culling, things should be good to go now

@phofl phofl marked this pull request as ready for review November 13, 2024 16:03
@phofl phofl changed the title Taskspec fuse Use Taskspec fuse implementation Nov 13, 2024
@phofl phofl merged commit bb0aef0 into dask:main Nov 13, 2024
7 checks passed
@phofl
Copy link
Collaborator

phofl commented Nov 13, 2024

thanks

@fjetter fjetter deleted the taskspec_fuse branch November 13, 2024 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants